Text File | 1995-07-26 | 78.3 KB | 1,452 lines
GAWK(1)          Free Software Foundation (Apr 18 1994)          GAWK(1)

NAME
     gawk - pattern scanning and processing language

SYNOPSIS
     gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
     gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

DESCRIPTION
     Gawk is the GNU Project's implementation of the AWK programming
     language. It conforms to the definition of the language in the
     POSIX 1003.2 Command Language And Utilities Standard. This version
     in turn is based on the description in The AWK Programming
     Language, by Aho, Kernighan, and Weinberger, with the additional
     features defined in the System V Release 4 version of UNIX awk.
     Gawk also provides some GNU-specific extensions.

     The command line consists of options to gawk itself, the AWK
     program text (if not supplied via the -f or --file options), and
     values to be made available in the ARGC and ARGV pre-defined AWK
     variables.

OPTIONS
     Gawk options may be either the traditional POSIX one letter
     options, or the GNU style long options. POSIX style options start
     with a single ``-'', while GNU long options start with ``--''.
     GNU style long options are provided for both GNU-specific features
     and for POSIX mandated features. Other implementations of the AWK
     language are likely to only accept the traditional one letter
     options.

     Following the POSIX standard, gawk-specific options are supplied
     via arguments to the -W option. Multiple -W options may be
     supplied, or multiple arguments may be supplied together if they
     are separated by commas, or enclosed in quotes and separated by
     white space.
     Case is ignored in arguments to the -W option.

     Each -W option has a corresponding GNU style long option, as
     detailed below. Arguments to GNU style long options are either
     joined with the option by an = sign, with no intervening spaces,
     or they may be provided in the next command line argument.

     Gawk accepts the following options.

     -F fs
     --field-separator=fs
          Use fs for the input field separator (the value of the FS
          predefined variable).

     -v var=val
     --assign=var=val
          Assign the value val to the variable var before execution of
          the program begins. Such variable values are available to
          the BEGIN block of an AWK program.

     -f program-file
     --file=program-file
          Read the AWK program source from the file program-file,
          instead of from the first command line argument. Multiple
          -f (or --file) options may be used.

     -mf=NNN
     -mr=NNN
          Set various memory limits to the value NNN. The f flag sets
          the maximum number of fields, and the r flag sets the
          maximum record size. These two flags and the -m option are
          from the AT&T Bell Labs research version of UNIX awk. They
          are ignored by gawk, since gawk has no pre-defined limits.

     -W compat
     --compat
          Run in compatibility mode. In compatibility mode, gawk
          behaves identically to UNIX awk; none of the GNU-specific
          extensions are recognized. See GNU EXTENSIONS, below, for
          more information.

     -W copyleft
     -W copyright
     --copyleft
     --copyright
          Print the short version of the GNU copyright information
          message on the error output.

     -W help
     -W usage
     --help
     --usage
          Print a relatively short summary of the available options on
          the error output. Per the GNU Coding Standards, these
          options cause an immediate, successful exit.

     -W lint
     --lint
          Provide warnings about constructs that are dubious or
          non-portable to other AWK implementations.

     -W posix
     --posix
          This turns on compatibility mode, with the following
          additional restrictions:

          o \x escape sequences are not recognized.

          o The synonym func for the keyword function is not
            recognized.

          o The operators ** and **= cannot be used in place of ^ and
            ^=.

     -W source=program-text
     --source=program-text
          Use program-text as AWK program source code. This option
          allows the easy intermixing of library functions (used via
          the -f and --file options) with source code entered on the
          command line. It is intended primarily for medium to large
          size AWK programs used in shell scripts.

          The -W source= form of this option uses the rest of the
          command line argument for program-text; no other options to
          -W will be recognized in the same argument.

     -W version
     --version
          Print version information for this particular copy of gawk
          on the error output. This is useful mainly for knowing if
          the current copy of gawk on your system is up to date with
          respect to whatever the Free Software Foundation is
          distributing. Per the GNU Coding Standards, these options
          cause an immediate, successful exit.

     --   Signal the end of options. This is useful to allow further
          arguments to the AWK program itself to start with a ``-''.
          This is mainly for consistency with the argument parsing
          convention used by most other POSIX programs.

     In compatibility mode, any other options are flagged as illegal,
     but are otherwise ignored. In normal operation, as long as
     program text has been supplied, unknown options are passed on to
     the AWK program in the ARGV array for processing. This is
     particularly useful for running AWK programs via the ``#!''
     executable interpreter mechanism.

AWK PROGRAM EXECUTION
     An AWK program consists of a sequence of pattern-action
     statements and optional function definitions.

          pattern   { action statements }
          function name(parameter list) { statements }

     Gawk first reads the program source from the program-file(s) if
     specified, from arguments to -W source=, or from the first
     non-option argument on the command line. The -f and -W source=
     options may be used multiple times on the command line.
     Gawk will read the program text as if all the program-files and
     command line source texts had been concatenated together. This is
     useful for building libraries of AWK functions, without having to
     include them in each new AWK program that uses them. It also
     provides the ability to mix library functions with command line
     programs.

     The environment variable AWKPATH specifies a search path to use
     when finding source files named with the -f option. If this
     variable does not exist, the default path is
     ".:/usr/lib/awk:/usr/skunk/lib/awk". If a file name given to the
     -f option contains a ``/'' character, no path search is
     performed.

     Gawk executes AWK programs in the following order. First, all
     variable assignments specified via the -v option are performed.
     Next, gawk compiles the program into an internal form. Then, gawk
     executes the code in the BEGIN block(s) (if any), and then
     proceeds to read each file named in the ARGV array. If there are
     no files named on the command line, gawk reads the standard
     input.

     If a filename on the command line has the form var=val it is
     treated as a variable assignment. The variable var will be
     assigned the value val. (This happens after any BEGIN block(s)
     have been run.) Command line variable assignment is most useful
     for dynamically assigning values to the variables AWK uses to
     control how input is broken into fields and records. It is also
     useful for controlling state if multiple passes are needed over a
     single data file.
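
     As a minimal sketch of the multiple-pass use of command line
     assignments described above, the following reads one data file
     twice. The variable name pass, the temporary file, and its
     contents are hypothetical and chosen only for illustration; any
     POSIX awk should behave the same way as gawk here.

```shell
# Hypothetical data file; the name and contents are made up for illustration.
tmp=$(mktemp)
printf '10\n20\n30\n' > "$tmp"

# Each var=val argument is applied when awk reaches it in the argument
# list, that is, just before the file that follows it is read.
# Pass 1 accumulates a total; pass 2 prints each value as a share of it.
result=$(awk '
    pass == 1 { total += $1 }
    pass == 2 { printf "%d %.1f%%\n", $1, 100 * $1 / total }
' pass=1 "$tmp" pass=2 "$tmp")

echo "$result"
rm -f "$tmp"
```

     Because the assignments are processed after the BEGIN block(s),
     pass has no value inside BEGIN; use -v when a value is needed
     there.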

     If the value of a particular element of ARGV is empty (""), gawk
     skips over it.

     For each line in the input, gawk tests to see if it matches any
     pattern in the AWK program. For each pattern that the line
     matches, the associated action is executed. The patterns are
     tested in the order they occur in the program.

     Finally, after all the input is exhausted, gawk executes the code
     in the END block(s) (if any).

VARIABLES AND FIELDS
     AWK variables are dynamic; they come into existence when they are
     first used. Their values are either floating-point numbers or
     strings, or both, depending upon how they are used. AWK also has
     one dimensional arrays; arrays with multiple dimensions may be
     simulated. Several pre-defined variables are set as a program
     runs; these will be described as needed and summarized below.

   Fields
     As each input line is read, gawk splits the line into fields,
     using the value of the FS variable as the field separator. If FS
     is a single character, fields are separated by that character.
     Otherwise, FS is expected to be a full regular expression. In the
     special case that FS is a single blank, fields are separated by
     runs of blanks and/or tabs. Note that the value of IGNORECASE
     (see below) will also affect how fields are split when FS is a
     regular expression.

     If the FIELDWIDTHS variable is set to a space separated list of
     numbers, each field is expected to have fixed width, and gawk
     will split up the record using the specified widths. The value of
     FS is ignored.
     Assigning a new value to FS overrides the use of FIELDWIDTHS, and
     restores the default behavior.

     Each field in the input line may be referenced by its position,
     $1, $2, and so on. $0 is the whole line. The value of a field may
     be assigned to as well. Fields need not be referenced by
     constants:

          n = 5
          print $n

     prints the fifth field in the input line. The variable NF is set
     to the total number of fields in the input line.

     References to non-existent fields (i.e. fields after $NF) produce
     the null-string. However, assigning to a non-existent field
     (e.g., $(NF+2) = 5) will increase the value of NF, create any
     intervening fields with the null string as their value, and cause
     the value of $0 to be recomputed, with the fields being separated
     by the value of OFS. References to negative numbered fields cause
     a fatal error.

   Built-in Variables
     AWK's built-in variables are:

     ARGC
          The number of command line arguments (does not include
          options to gawk, or the program source).

     ARGIND
          The index in ARGV of the current file being processed.

     ARGV
          Array of command line arguments. The array is indexed from 0
          to ARGC - 1. Dynamically changing the contents of ARGV can
          control the files used for data.

     CONVFMT
          The conversion format for numbers, "%.6g", by default.

     ENVIRON
          An array containing the values of the current environment.
          The array is indexed by the environment variables, each
          element being the value of that variable (e.g.,
          ENVIRON["HOME"] might be /u/arnold). Changing this array
          does not affect the environment seen by programs which gawk
          spawns via redirection or the system() function. (This may
          change in a future version of gawk.)

     ERRNO
          If a system error occurs either doing a redirection for
          getline, during a read for getline, or during a close(),
          then ERRNO will contain a string describing the error.

     FIELDWIDTHS
          A white-space separated list of field widths. When set, gawk
          parses the input into fields of fixed width, instead of
          using the value of the FS variable as the field separator.
          The fixed field width facility is still experimental; expect
          the semantics to change as gawk evolves over time.

     FILENAME
          The name of the current input file. If no files are
          specified on the command line, the value of FILENAME is
          ``-''. However, FILENAME is undefined inside the BEGIN
          block.

     FNR
          The input record number in the current input file.

     FS
          The input field separator, a blank by default.

     IGNORECASE
          Controls the case-sensitivity of all regular expression
          operations. If IGNORECASE has a non-zero value, then pattern
          matching in rules, field splitting with FS, regular
          expression matching with ~ and !~, and the gsub(), index(),
          match(), split(), and sub() pre-defined functions will all
          ignore case when doing regular expression operations.
          Thus, if IGNORECASE is not equal to zero, /aB/ matches all
          of the strings "ab", "aB", "Ab", and "AB". As with all AWK
          variables, the initial value of IGNORECASE is zero, so all
          regular expression operations are normally case-sensitive.

     NF
          The number of fields in the current input record.

     NR
          The total number of input records seen so far.

     OFMT
          The output format for numbers, "%.6g", by default.

     OFS
          The output field separator, a blank by default.

     ORS
          The output record separator, by default a newline.

     RS
          The input record separator, by default a newline. RS is
          exceptional in that only the first character of its string
          value is used for separating records. (This will probably
          change in a future release of gawk.) If RS is set to the
          null string, then records are separated by blank lines. When
          RS is set to the null string, then the newline character
          always acts as a field separator, in addition to whatever
          value FS may have.

     RSTART
          The index of the first character matched by match(); 0 if no
          match.

     RLENGTH
          The length of the string matched by match(); -1 if no match.

     SUBSEP
          The character used to separate multiple subscripts in array
          elements, by default "\034".

   Arrays
     Arrays are subscripted with an expression between square brackets
     ([ and ]). If the expression is an expression list (expr, expr ...)
     then the array subscript is a string consisting of the
     concatenation of the (string) value of each expression, separated
     by the value of the SUBSEP variable. This facility is used to
     simulate multiply dimensioned arrays. For example:

          i = "A" ; j = "B" ; k = "C"
          x[i, j, k] = "hello, world\n"

     assigns the string "hello, world\n" to the element of the array x
     which is indexed by the string "A\034B\034C". All arrays in AWK
     are associative, i.e. indexed by string values.

     The special operator in may be used in an if or while statement
     to see if an array has an index consisting of a particular value.

          if (val in array)
               print array[val]

     If the array has multiple subscripts, use (i, j) in array.

     The in construct may also be used in a for loop to iterate over
     all the elements of an array.

     An element may be deleted from an array using the delete
     statement. The delete statement may also be used to delete the
     entire contents of an array.

   Variable Typing And Conversion
     Variables and fields may be (floating point) numbers, or strings,
     or both. How the value of a variable is interpreted depends upon
     its context. If used in a numeric expression, it will be treated
     as a number, if used as a string it will be treated as a string.

     To force a variable to be treated as a number, add 0 to it; to
     force it to be treated as a string, concatenate it with the null
     string.

     When a string must be converted to a number, the conversion is
     accomplished using atof(3). A number is converted to a string by
     using the value of CONVFMT as a format string for sprintf(3),
     with the numeric value of the variable as the argument. However,
     even though all numbers in AWK are floating-point, integral
     values are always converted as integers. Thus, given

          CONVFMT = "%2.2f"
          a = 12
          b = a ""

     the variable b has a string value of "12" and not "12.00".

     Gawk performs comparisons as follows: If two variables are
     numeric, they are compared numerically. If one value is numeric
     and the other has a string value that is a ``numeric string,''
     then comparisons are also done numerically. Otherwise, the
     numeric value is converted to a string and a string comparison is
     performed. Two strings are compared, of course, as strings.
     According to the POSIX standard, even if two strings are numeric
     strings, a numeric comparison is performed. However, this is
     clearly incorrect, and gawk does not do this.

     Uninitialized variables have the numeric value 0 and the string
     value "" (the null, or empty, string).

PATTERNS AND ACTIONS
     AWK is a line oriented language. The pattern comes first, and
     then the action. Action statements are enclosed in { and }.
     Either the pattern may be missing, or the action may be missing,
     but, of course, not both. If the pattern is missing, the action
     will be executed for every single line of input.
     A missing action is equivalent to

          { print }

     which prints the entire line.

     Comments begin with the ``#'' character, and continue until the
     end of the line. Blank lines may be used to separate statements.
     Normally, a statement ends with a newline, however, this is not
     the case for lines ending in a ``,'', ``{'', ``?'', ``:'',
     ``&&'', or ``||''. Lines ending in do or else also have their
     statements automatically continued on the following line. In
     other cases, a line can be continued by ending it with a ``\'',
     in which case the newline will be ignored.

     Multiple statements may be put on one line by separating them
     with a ``;''. This applies to both the statements within the
     action part of a pattern-action pair (the usual case), and to the
     pattern-action statements themselves.

   Patterns
     AWK patterns may be one of the following:

          BEGIN
          END
          /regular expression/
          relational expression
          pattern && pattern
          pattern || pattern
          pattern ? pattern : pattern
          (pattern)
          ! pattern
          pattern1, pattern2

     BEGIN and END are two special kinds of patterns which are not
     tested against the input. The action parts of all BEGIN patterns
     are merged as if all the statements had been written in a single
     BEGIN block. They are executed before any of the input is read.
     Similarly, all the END blocks are merged, and executed when all
     the input is exhausted (or when an exit statement is executed).
     BEGIN and END patterns cannot be combined with other patterns in
     pattern expressions. BEGIN and END patterns cannot have missing
     action parts.
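
     The merging of BEGIN blocks and the timing of END can be seen in
     a small sketch like the one below. The input lines and the
     variable name sum are hypothetical, chosen only for illustration.

```shell
result=$(printf 'a 1\nb 2\nc 3\n' | awk '
    BEGIN { print "start" }        # runs before any input is read
    BEGIN { sum = 0 }              # merged with the BEGIN block above
          { sum += $2 }            # no pattern: runs for every record
    END   { print "sum", sum }     # runs once the input is exhausted
')
echo "$result"
```

     With the three input lines shown, this prints "start" followed by
     "sum 6".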

     For /regular expression/ patterns, the associated statement is
     executed for each input line that matches the regular expression.
     Regular expressions are the same as those in egrep(1), and are
     summarized below.

     A relational expression may use any of the operators defined
     below in the section on actions. These generally test whether
     certain fields match certain regular expressions.

     The &&, ||, and ! operators are logical AND, logical OR, and
     logical NOT, respectively, as in C. They do short-circuit
     evaluation, also as in C, and are used for combining more
     primitive pattern expressions. As in most languages, parentheses
     may be used to change the order of evaluation.

     The ?: operator is like the same operator in C. If the first
     pattern is true then the pattern used for testing is the second
     pattern, otherwise it is the third. Only one of the second and
     third patterns is evaluated.

     The pattern1, pattern2 form of an expression is called a range
     pattern. It matches all input records starting with a line that
     matches pattern1, and continuing until a record that matches
     pattern2, inclusive. It does not combine with any other sort of
     pattern expression.

   Regular Expressions
     Regular expressions are the extended kind found in egrep. They
     are composed of characters as follows:

     c         matches the non-metacharacter c.

     \c        matches the literal character c.

     .         matches any character except newline.

     ^         matches the beginning of a line or a string.

     $         matches the end of a line or a string.

     [abc...]  character class, matches any of the characters abc....

     [^abc...] negated character class, matches any character except
               abc... and newline.

     r1|r2     alternation: matches either r1 or r2.

     r1r2      concatenation: matches r1, and then r2.

     r+        matches one or more r's.

     r*        matches zero or more r's.

     r?        matches zero or one r's.

     (r)       grouping: matches r.

     The escape sequences that are valid in string constants (see
     below) are also legal in regular expressions.

   Actions
     Action statements are enclosed in braces, { and }. Action
     statements consist of the usual assignment, conditional, and
     looping statements found in most languages. The operators,
     control statements, and input/output statements available are
     patterned after those in C.

   Operators
     The operators in AWK, in order of increasing precedence, are

     = += -= *= /= %= ^=
               Assignment. Both absolute assignment (var = value) and
               operator-assignment (the other forms) are supported.

     ?:        The C conditional expression. This has the form
               expr1 ? expr2 : expr3. If expr1 is true, the value of
               the expression is expr2, otherwise it is expr3. Only
               one of expr2 and expr3 is evaluated.

     ||        Logical OR.

     &&        Logical AND.

     ~ !~      Regular expression match, negated match. NOTE: Do not
               use a constant regular expression (/foo/) on the
               left-hand side of a ~ or !~. Only use one on the
               right-hand side. The expression /foo/ ~ exp has the
               same meaning as (($0 ~ /foo/) ~ exp). This is usually
               not what was intended.

     < > <= >= != ==
               The regular relational operators.

     blank     String concatenation.

     + -       Addition and subtraction.

     * / %     Multiplication, division, and modulus.

     + - !     Unary plus, unary minus, and logical negation.

     ^         Exponentiation (** may also be used, and **= for the
               assignment operator).

     ++ --     Increment and decrement, both prefix and postfix.

     $         Field reference.

   Control Statements
     The control statements are as follows:

          if (condition) statement [ else statement ]
          while (condition) statement
          do statement while (condition)
          for (expr1; expr2; expr3) statement
          for (var in array) statement
          break
          continue
          delete array[index]
          delete array
          exit [ expression ]
          { statements }

   I/O Statements
     The input/output statements are as follows:

     close(filename)
          Close file (or pipe, see below).

     getline
          Set $0 from next input record; set NF, NR, FNR.

     getline <file
          Set $0 from next record of file; set NF.

     getline var
          Set var from next input record; set NR, FNR.

     getline var <file
          Set var from next record of file.

     next
          Stop processing the current input record. The next input
          record is read and processing starts over with the first
          pattern in the AWK program. If the end of the input data is
          reached, the END block(s), if any, are executed.

     next file
          Stop processing the current input file. The next input
          record read comes from the next input file. FILENAME is
          updated, FNR is reset to 1, and processing starts over with
          the first pattern in the AWK program. If the end of the
          input data is reached, the END block(s), if any, are
          executed.

     print
          Prints the current record.

     print expr-list
          Prints expressions. Each expression is separated by the
          value of the OFS variable. The output record is terminated
          with the value of the ORS variable.

     print expr-list >file
          Prints expressions on file. Each expression is separated by
          the value of the OFS variable. The output record is
          terminated with the value of the ORS variable.

     printf fmt, expr-list
          Format and print.

     printf fmt, expr-list >file
          Format and print on file.

     system(cmd-line)
          Execute the command cmd-line, and return the exit status.
          (This may not be available on non-POSIX systems.)

     Other input/output redirections are also allowed. For print and
     printf, >>file appends output to the file, while | command writes
     on a pipe. In a similar fashion, command | getline pipes into
     getline.
     The getline command will return 0 on end of file, and -1 on an
     error.

   The printf Statement
     The AWK versions of the printf statement and sprintf() function
     (see below) accept the following conversion specification
     formats:

     %c   An ASCII character. If the argument used for %c is numeric,
          it is treated as a character and printed. Otherwise, the
          argument is assumed to be a string, and only the first
          character of that string is printed.

     %d   A decimal number (the integer part).

     %i   Just like %d.

     %e   A floating point number of the form [-]d.ddddddE[+-]dd.

     %f   A floating point number of the form [-]ddd.dddddd.

     %g   Use e or f conversion, whichever is shorter, with
          nonsignificant zeros suppressed.

     %o   An unsigned octal number (again, an integer).

     %s   A character string.

     %x   An unsigned hexadecimal number (an integer).

     %X   Like %x, but using ABCDEF instead of abcdef.

     %%   A single % character; no argument is converted.

     There are optional, additional parameters that may lie between
     the % and the control letter:

     -    The expression should be left-justified within its field.

     width
          The field should be padded to this width. If the number has
          a leading zero, then the field will be padded with zeros.
          Otherwise it is padded with blanks. This applies even to the
          non-numeric output formats.

     .prec
          A number indicating the maximum width of strings or digits
          to the right of the decimal point.
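
     The conversions and modifiers listed above can be tried out with
     a short sketch such as the following. The argument values are
     arbitrary and chosen only for illustration; the %c line assumes
     an ASCII-compatible character set.

```shell
result=$(awk 'BEGIN {
    # Numeric 65 is printed as a character; "hello" contributes only h.
    printf "%c%c\n", 65, "hello"
    # Plain width, left-justified width, and zero-padded width.
    printf "[%5d][%-5d][%05d]\n", 42, 42, 42
    # .prec caps a string at 3 characters and a float at 2 decimals.
    printf "[%.3s][%.2f]\n", "abcdef", 3.14159
    # Octal and hexadecimal conversions.
    printf "%o %x %X\n", 8, 255, 255
}')
echo "$result"
```

     Under those assumptions the four lines printed are "Ah",
     "[   42][42   ][00042]", "[abc][3.14]", and "10 ff FF".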
     The dynamic width and prec capabilities of the ANSI C printf()
     routines are supported. A * in place of either the width or
     prec specifications will cause their values to be taken from
     the argument list to printf or sprintf().

   Special File Names
     When doing I/O redirection from either print or printf into a
     file, or via getline from a file, gawk recognizes certain
     special filenames internally. These filenames allow access to
     open file descriptors inherited from gawk's parent process
     (usually the shell). Other special filenames provide access to
     information about the running gawk process. The filenames
     are:

     /dev/pid       Reading this file returns the process ID of the
                    current process, in decimal, terminated with a
                    newline.

     /dev/ppid      Reading this file returns the parent process ID
                    of the current process, in decimal, terminated
                    with a newline.

     /dev/pgrpid    Reading this file returns the process group ID
                    of the current process, in decimal, terminated
                    with a newline.

     /dev/user      Reading this file returns a single record
                    terminated with a newline. The fields are
                    separated with blanks. $1 is the value of the
                    getuid(2) system call, $2 is the value of the
                    geteuid(2) system call, $3 is the value of the
                    getgid(2) system call, and $4 is the value of
                    the getegid(2) system call. If there are any
                    additional fields, they are the group IDs
                    returned by getgroups(2).
                    Multiple groups may not be supported on all
                    systems.

     /dev/stdin     The standard input.

     /dev/stdout    The standard output.

     /dev/stderr    The standard error output.

     /dev/fd/n      The file associated with the open file
                    descriptor n.

     These are particularly useful for error messages. For
     example:

          print "You blew it!" > "/dev/stderr"

     whereas you would otherwise have to use

          print "You blew it!" | "cat 1>&2"

     These file names may also be used on the command line to name
     data files.

   Numeric Functions
     AWK has the following pre-defined arithmetic functions:

     atan2(y, x)    returns the arctangent of y/x in radians.

     cos(expr)      returns the cosine of expr, which is in
                    radians.

     exp(expr)      the exponential function.

     int(expr)      truncates to integer.

     log(expr)      the natural logarithm function.

     rand()         returns a random number between 0 and 1.

     sin(expr)      returns the sine of expr, which is in radians.

     sqrt(expr)     the square root function.

     srand(expr)    use expr as a new seed for the random number
                    generator. If no expr is provided, the time of
                    day will be used. The return value is the
                    previous seed for the random number generator.
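     A few of the arithmetic functions above can be exercised as
     follows. This is a minimal sketch, assuming a POSIX shell with
     an awk on the search path; numeric_demo is a made-up name, and
     the atan2(0, -1) idiom for pi is an ordinary trigonometric
     identity rather than something this page prescribes.

```shell
# Illustrative only: numeric_demo is a hypothetical helper name.
numeric_demo() {
    awk 'BEGIN {
        pi = atan2(0, -1)               # the arctangent of 0/-1 is pi
        printf "%.4f\n", pi
        print int(3.9), int(-3.9)       # the fractional part is dropped
        print sqrt(16), exp(0), log(1)
        printf "%.1f\n", cos(pi)        # the argument is in radians
    }'
}

numeric_demo
```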
   String Functions
     AWK has the following pre-defined string functions:

     gsub(r, s, t)  for each substring matching the regular
                    expression r in the string t, substitute the
                    string s, and return the number of
                    substitutions. If t is not supplied, use $0.

     index(s, t)    returns the index of the string t in the string
                    s, or 0 if t is not present.

     length(s)      returns the length of the string s, or the
                    length of $0 if s is not supplied.

     match(s, r)    returns the position in s where the regular
                    expression r occurs, or 0 if r is not present,
                    and sets the values of RSTART and RLENGTH.

     split(s, a, r) splits the string s into the array a on the
                    regular expression r, and returns the number of
                    fields. If r is omitted, FS is used instead.
                    The array a is cleared first.

     sprintf(fmt, expr-list)
                    prints expr-list according to fmt, and returns
                    the resulting string.

     sub(r, s, t)   just like gsub(), but only the first matching
                    substring is replaced.

     substr(s, i, n)
                    returns the n-character substring of s starting
                    at i. If n is omitted, the rest of s is used.

     tolower(str)   returns a copy of the string str, with all the
                    upper-case characters in str translated to
                    their corresponding lower-case counterparts.
                    Non-alphabetic characters are left unchanged.

     toupper(str)   returns a copy of the string str, with all the
                    lower-case characters in str translated to
                    their corresponding upper-case counterparts.
                    Non-alphabetic characters are left unchanged.

   Time Functions
     Since one of the primary uses of AWK programs is processing
     log files that contain time stamp information, gawk provides
     the following two functions for obtaining time stamps and
     formatting them.

     systime() returns the current time of day as the number of
               seconds since the Epoch (Midnight UTC, January 1,
               1970 on POSIX systems).

     strftime(format, timestamp)
               formats timestamp according to the specification in
               format. The timestamp should be of the same form as
               returned by systime(). If timestamp is missing, the
               current time of day is used. See the specification
               for the strftime() function in ANSI C for the format
               conversions that are guaranteed to be available. A
               public-domain version of strftime(3) and a man page
               for it are shipped with gawk; if that version was
               used to build gawk, then all of the conversions
               described in that man page are available to gawk.

   String Constants
     String constants in AWK are sequences of characters enclosed
     between double quotes ("). Within strings, certain escape
     sequences are recognized, as in C. These are:

     \\   A literal backslash.

     \a   The ``alert'' character; usually the ASCII BEL character.

     \b   backspace.

     \f   form-feed.

     \n   new line.

     \r   carriage return.

     \t   horizontal tab.

     \v   vertical tab.

     \xhex digits
          The character represented by the string of hexadecimal
          digits following the \x.
          As in ANSI C, all following hexadecimal digits are
          considered part of the escape sequence. (This feature
          should tell us something about language design by
          committee.) E.g., "\x1B" is the ASCII ESC (escape)
          character.

     \ddd The character represented by the 1-, 2-, or 3-digit
          sequence of octal digits. E.g., "\033" is the ASCII ESC
          (escape) character.

     \c   The literal character c.

     The escape sequences may also be used inside constant regular
     expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace
     characters).

FUNCTIONS
     Functions in AWK are defined as follows:

          function name(parameter list) { statements }

     Functions are executed when called from within the action
     parts of regular pattern-action statements. Actual parameters
     supplied in the function call are used to instantiate the
     formal parameters declared in the function. Arrays are passed
     by reference, other variables are passed by value.

     Since functions were not originally part of the AWK language,
     the provision for local variables is rather clumsy: They are
     declared as extra parameters in the parameter list. The
     convention is to separate local variables from real parameters
     by extra spaces in the parameter list. For example:

          function  f(p, q,     a, b) {   # a & b are local
                         ..... }

          /abc/     { ... ; f(1, 2) ; ... }

     The left parenthesis in a function call is required to
     immediately follow the function name, without any intervening
     white space.
     This is to avoid a syntactic ambiguity with the concatenation
     operator. This restriction does not apply to the built-in
     functions listed above.

     Functions may call each other and may be recursive. Function
     parameters used as local variables are initialized to the null
     string and the number zero upon function invocation.

     The word func may be used in place of function.

EXAMPLES
     Print and sort the login names of all users:

          BEGIN     { FS = ":" }
                    { print $1 | "sort" }

     Count lines in a file:

                    { nlines++ }
          END       { print nlines }

     Precede each line by its number in the file:

          { print FNR, $0 }

     Concatenate and line number (a variation on a theme):

          { print NR, $0 }

SEE ALSO
     egrep(1), getpid(2), getppid(2), getpgrp(2), getuid(2),
     geteuid(2), getgid(2), getegid(2), getgroups(2)

     The AWK Programming Language, Alfred V. Aho, Brian W.
     Kernighan, Peter J. Weinberger, Addison-Wesley, 1988. ISBN
     0-201-07981-X.

     The GAWK Manual, Edition 0.15, published by the Free Software
     Foundation, 1993.

POSIX COMPATIBILITY
     A primary goal for gawk is compatibility with the POSIX
     standard, as well as with the latest version of UNIX awk.
     To this end, gawk incorporates the following user-visible
     features which are not described in the AWK book, but are part
     of awk in System V Release 4, and are in the POSIX standard.

     The -v option for assigning variables before program execution
     starts is new. The book indicates that command line variable
     assignment happens when awk would otherwise open the argument
     as a file, which is after the BEGIN block is executed.
     However, in earlier implementations, when such an assignment
     appeared before any file names, the assignment would happen
     before the BEGIN block was run. Applications came to depend on
     this ``feature.'' When awk was changed to match its
     documentation, this option was added to accommodate
     applications that depended upon the old behavior. (This
     feature was agreed upon by both the AT&T and GNU developers.)

     The -W option for implementation specific features is from the
     POSIX standard.

     When processing arguments, gawk uses the special option ``--''
     to signal the end of arguments. In compatibility mode, it will
     warn about, but otherwise ignore, undefined options. In normal
     operation, such arguments are passed on to the AWK program for
     it to process.

     The AWK book does not define the return value of srand(). The
     System V Release 4 version of UNIX awk (and the POSIX
     standard) has it return the seed it was using, to allow
     keeping track of random number sequences. Therefore srand() in
     gawk also returns its current seed.
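     The srand() return value described above makes it possible to
     record a seed and replay a sequence later. This is a minimal
     sketch, assuming a POSIX shell and an awk that follows the
     POSIX behavior; srand_demo and the seed values are made up for
     illustration.

```shell
# Illustrative only: srand_demo is a hypothetical helper name.
srand_demo() {
    awk 'BEGIN {
        srand(5)
        prev = srand(7)         # returns the seed that was in effect: 5
        srand(1); a = rand()
        srand(1); b = rand()    # same seed, so the same first value
        print prev, (a == b)
    }'
}

srand_demo
```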
     Other new features are: The use of multiple -f options (from
     MKS awk); the ENVIRON array; the \a and \v escape sequences
     (done originally in gawk and fed back into AT&T's); the
     tolower() and toupper() built-in functions (from AT&T); and
     the ANSI C conversion specifications in printf (done first in
     AT&T's version).

GNU EXTENSIONS
     Gawk has some extensions to POSIX awk. They are described in
     this section. All the extensions described here can be
     disabled by invoking gawk with the -W compat option.

     The following features of gawk are not available in POSIX awk.

          o The \x escape sequence.

          o The systime() and strftime() functions.

          o The special file names available for I/O redirection.

          o The special meaning of the ARGIND and ERRNO variables.

          o The IGNORECASE variable and its side-effects.

          o The FIELDWIDTHS variable and fixed width field
            splitting.

          o The path search for files named via the -f option, and
            therefore the special meaning of the AWKPATH
            environment variable.

          o The use of next file to abandon processing of the
            current input file.

          o The use of delete array to delete the entire contents
            of an array.

     The AWK book does not define the return value of the close()
     function.
     Gawk's close() returns the value from fclose(3), or pclose(3),
     when closing a file or pipe, respectively.

     When gawk is invoked with the -W compat option, if the fs
     argument to the -F option is ``t'', then FS will be set to the
     tab character. Since this is a rather ugly special case, it is
     not the default behavior. This behavior also does not occur if
     -W posix has been specified.

HISTORICAL FEATURES
     There are two features of historical AWK implementations that
     gawk supports. First, it is possible to call the length()
     built-in function not only with no argument, but even without
     parentheses! Thus,

          a = length

     is the same as either of

          a = length()
          a = length($0)

     This feature is marked as ``deprecated'' in the POSIX
     standard, and gawk will issue a warning about its use if -W
     lint is specified on the command line.

     The other feature is the use of the continue statement outside
     the body of a while, for, or do loop. Traditional AWK
     implementations have treated such usage as equivalent to the
     next statement. Gawk will support this usage if -W posix has
     not been specified.

ENVIRONMENT VARIABLES
     If POSIXLY_CORRECT exists in the environment, then gawk
     behaves exactly as if --posix had been specified on the
     command line. If --lint has been specified, gawk will issue a
     warning message to this effect.

BUGS
     The -F option is not necessary given the command line variable
     assignment feature; it remains only for backwards
     compatibility.
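     The equivalence between -F and variable assignment can be seen
     by setting FS both ways. This is a minimal sketch, assuming a
     POSIX shell with an awk on the search path; the function names
     and the sample input line are made up for illustration, and
     the assignment is done here with the -v option.

```shell
# Illustrative only: both helper names and the input line are hypothetical.
first_field_F() {
    printf 'root:x:0:0\n' | awk -F: '{ print $1 }'
}

first_field_v() {
    printf 'root:x:0:0\n' | awk -v FS=: '{ print $1 }'
}

first_field_F
first_field_v
```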
     If your system actually has support for /dev/fd and the
     associated /dev/stdin, /dev/stdout, and /dev/stderr files, you
     may get different output from gawk than you would get on a
     system without those files. When gawk interprets these files
     internally, it synchronizes output to the standard output with
     output to /dev/stdout, while on a system with those files, the
     output is actually to different open files. Caveat Emptor.

VERSION INFORMATION
     This man page documents gawk, version 2.15.

     Starting with the 2.15 version of gawk, the -c, -V, -C, -a,
     and -e options of the 2.11 version are no longer recognized.
     This fact will not even be documented in the manual page for
     version 2.16.

AUTHORS
     The original version of UNIX awk was designed and implemented
     by Alfred Aho, Peter Weinberger, and Brian Kernighan of AT&T
     Bell Labs. Brian Kernighan continues to maintain and enhance
     it.

     Paul Rubin and Jay Fenlason, of the Free Software Foundation,
     wrote gawk, to be compatible with the original version of awk
     distributed in Seventh Edition UNIX. John Woods contributed a
     number of bug fixes. David Trueman, with contributions from
     Arnold Robbins, made gawk compatible with the new version of
     UNIX awk.

     The initial DOS port was done by Conrad Kwok and Scott
     Garfinkle. Scott Deifik is the current DOS maintainer. Pat
     Rankin did the port to VMS, and Michal Jaegermann did the port
     to the Atari ST. The port to OS/2 was done by Kai Uwe Rommel,
     with contributions and help from Darrel Hankerson.
ACKNOWLEDGEMENTS
     Brian Kernighan of Bell Labs provided valuable assistance
     during testing and debugging. We thank him.